Abstract: In this paper we study the compressed sensing problem
of recovering a sparse signal from a system of underdetermined
linear equations when we have prior information about the
probability of each entry of the unknown signal being nonzero. In
particular, we focus on a model where the entries of the unknown
vector fall into two sets, each with a different probability of
being nonzero. We propose a weighted $\ell_1$ minimization recovery
algorithm and analyze its performance using a Grassmann angle
approach. We compute explicitly the relationship between the
system parameters (the weights, the number of measurements,
the sizes of the two sets, the probabilities of being nonzero)
so that an i.i.d. random Gaussian measurement matrix along
with weighted $\ell_1$ minimization recovers almost all such sparse
signals with overwhelming probability as the problem dimension
increases. This allows us to compute the optimal weights. We
also provide simulations to demonstrate the advantages of the
method over conventional $\ell_1$ optimization.

I. Introduction 

Compressed sensing is an emerging technique of joint 
sampling and compression that has been recently proposed as 
an alternative to Nyquist sampling (followed by compression) 
for scenarios where measurements can be costly [14]. The 
whole premise is that sparse signals (signals with many zero 
or negligible elements in a known basis) can be recovered 
with far fewer measurements than the ambient dimension 
of the signal itself. In fact, the major breakthrough in this 
area has been the demonstration that $\ell_1$ minimization can
efficiently recover a sufficiently sparse vector from a system 
of underdetermined linear equations [2]. 

The conventional approach to compressed sensing assumes 
no prior information on the unknown signal other than the 
fact that it is sufficiently sparse in a particular basis. In many 
applications, however, additional prior information is available. 
In fact, in many cases the signal recovery problem (which 
compressed sensing attempts to address) is a detection or 
estimation problem in some statistical setting. Some recent 
work along these lines can be found in [5] (which considers 
compressed detection and estimation) and [6] (on Bayesian 
compressed sensing). In other cases, compressed sensing may 
be the inner loop of a larger estimation problem that feeds prior 
information on the sparse signal (e.g., its sparsity pattern) to 
the compressed sensing algorithm. 

In this paper we will consider a particular model for the
sparse signal that assigns a probability of being zero or
nonzero to each entry of the unknown vector. The standard
compressed sensing model is therefore a special case where
these probabilities are all equal (for example, for a $k$-sparse
vector the probabilities will all be $k/n$, where $n$ is the number
of entries of the unknown vector). As mentioned above, there
are many situations where such prior information may be
available, such as in natural images, medical imaging, or in
DNA microarrays where the signal is often block sparse, i.e.,
the signal is more likely to be nonzero in certain blocks rather
than in others [7].

While it is possible (albeit cumbersome) to study this model
in full generality, in this paper we will focus on the case where
the entries of the unknown signal fall into two categories:
in the first set (with cardinality $n_1$) the probability of being
nonzero is $P_1$, and in the second set (with cardinality
$n_2 = n - n_1$) this probability is $P_2$. (Clearly, in this case the
sparsity will with high probability be around $n_1P_1 + n_2P_2$.) This
model is rich enough to capture many of the salient features regarding
prior information, while being simple enough to allow a very
thorough analysis. While it is in principle possible to extend
our techniques to models with more than two categories of
entries, the analysis becomes increasingly tedious and so is
beyond the scope of this short paper.

The contributions of the paper are the following. We propose
a weighted $\ell_1$ minimization approach for sparse recovery
where the $\ell_1$ norms of each set are given different weights $W_i$
($i = 1, 2$). Clearly, one would want to give a larger weight to
those entries whose probability of being nonzero is smaller (thus
further forcing them to be zero).¹ The second contribution
is to compute explicitly the relationship between the $P_i$, the
$W_i$, the $n_i/n$, $i = 1, 2$, and the number of measurements so
that the unknown signal can be recovered with overwhelming
probability as $n \to \infty$ (the so-called weak threshold) for
measurement matrices drawn from an i.i.d. Gaussian ensemble.
The analysis uses the high-dimensional geometry techniques
first introduced by Donoho and Tanner [1], [3] (e.g., Grassmann
angles) to obtain sharp thresholds for compressed sensing.
However, rather than use the neighborliness condition used
in [1], [3], we find it more convenient to use the null space
characterization of Xu and Hassibi [4], [13]. The resulting
Grassmannian manifold approach is a general framework for
incorporating additional factors into compressed sensing: in

¹ A somewhat related method that uses weighted $\ell_1$ optimization is Candes
et al. [8]. The main difference is that there is no prior information and at
each step the $\ell_1$ optimization is re-weighted using the estimates of the signal
obtained in the last minimization step.



[4] it was used to incorporate measurement noise; here it
is used to incorporate prior information and weighted $\ell_1$
optimization. Our analytic results allow us to compute the
optimal weights for any $P_1$, $P_2$, $n_1$, $n_2$. We also provide
simulation results to show the advantages of the weighted
method over standard $\ell_1$ minimization.

II. Model 

The signal is represented by an $n \times 1$ vector
$x = (x_1, x_2, \ldots, x_n)^T$ of real valued numbers, and is non-uniformly
sparse with sparsity factor $P_1$ over the (index) set
$K_1 \subset \{1, 2, \ldots, n\}$ and sparsity factor $P_2$ over the set
$K_2 = \{1, 2, \ldots, n\} \setminus K_1$. By this, we mean that if $i \in K_1$,
$x_i$ is a nonzero element with probability $P_1$ and zero with probability
$1 - P_1$. However, if $i \in K_2$ the probability of $x_i$ being nonzero
is $P_2$. We assume that $|K_1| = n_1$ and $|K_2| = n_2 = n - n_1$.
The measurement matrix $A$ is an $m \times n$ matrix ($m/n = \delta < 1$)
with i.i.d. $\mathcal{N}(0, 1)$ entries. The observation vector is denoted
by $y$ and obeys the following:



$$y = Ax. \tag{1}$$
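The model above can be simulated directly. The following is a minimal sketch, assuming that $K_1$ is the first $n_1$ indices and that nonzero entries are standard Gaussian; the paper does not fix the distribution of the nonzero values, so that choice and the function name `sample_nonuniform_sparse` are illustrative only.

```python
import numpy as np

def sample_nonuniform_sparse(n1, n2, p1, p2, rng):
    """Draw x whose entries are nonzero w.p. p1 on K1 and w.p. p2 on K2.

    Assumption: K1 = {0, ..., n1-1}, K2 = {n1, ..., n1+n2-1}.
    """
    n = n1 + n2
    probs = np.concatenate([np.full(n1, p1), np.full(n2, p2)])
    support = rng.random(n) < probs                    # Bernoulli support pattern
    x = np.zeros(n)
    # Assumption: nonzero values are i.i.d. N(0, 1); the model only fixes the support law.
    x[support] = rng.standard_normal(int(support.sum()))
    return x, support

rng = np.random.default_rng(0)
n1, n2, p1, p2 = 500, 500, 0.3, 0.05
m = 750                                                # delta = m / n = 0.75
x, support = sample_nonuniform_sparse(n1, n2, p1, p2, rng)
A = rng.standard_normal((m, n1 + n2))                  # i.i.d. N(0, 1) measurement matrix
y = A @ x                                              # observation vector, equation (1)
expected_k = n1 * p1 + n2 * p2                         # typical support size
print(int(support.sum()), expected_k)
```

The printed support size should be close to $n_1P_1 + n_2P_2$, which is the concentration the analysis relies on.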



As mentioned in Section I, $\ell_1$ minimization can recover a
vector $x$ with $k = \mu n$ nonzeros, provided $\mu$ is less than a
known function of $\delta$. $\ell_1$ minimization has the following form:



$$\min_{Ax = y} \|x\|_1. \tag{2}$$



(2) is a linear program and can be solved in polynomial
time ($O(n^3)$). However, it fails to exploit any additional prior
information about the nature of the signal, should such
information be available. One might simply think of modifying (2) to a
weighted $\ell_1$ minimization as follows:
$$\min_{Ax = y} \|x\|_{w1} = \min_{Ax = y} \sum_{i=1}^{n} w_i |x_i|. \tag{3}$$
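Like (2), the weighted problem (3) is a linear program. A minimal sketch using SciPy's `linprog` is given below; it uses the standard split $x = u - v$ with $u, v \ge 0$, which is one common reformulation and not necessarily the solver setup used for the paper's experiments. The helper name `weighted_l1_min` and the toy dimensions are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_min(A, y, w):
    """Solve min sum_i w_i |x_i| subject to A x = y as an LP.

    Uses x = u - v with u, v >= 0, so that |x_i| = u_i + v_i at the optimum
    (valid because all weights are positive).
    """
    m, n = A.shape
    c = np.concatenate([w, w])               # objective: w^T u + w^T v
    A_eq = np.hstack([A, -A])                # constraint: A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(1)
n, m = 40, 25
x = np.zeros(n)
x[[2, 7, 30]] = [1.5, -2.0, 0.7]             # toy sparse signal (illustrative)
A = rng.standard_normal((m, n))
y = A @ x
# Illustrative weights: W1 = 1 on the first half (K1), W2 = 2 on the second half (K2).
w = np.where(np.arange(n) < n // 2, 1.0, 2.0)
x_hat = weighted_l1_min(A, y, w)
print(np.linalg.norm(x_hat - x))
```

Setting `w` to all ones reduces this to the ordinary $\ell_1$ problem (2).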



The index $w$ indicates the $n \times 1$ positive weight
vector. Now the question is what the optimal set of weights is,
and whether one can improve the recovery threshold using the weighted
$\ell_1$ minimization of (3) with those weights rather than (2). We
have to be more precise about the objective at this point and about what
we mean by extending the recovery threshold. First of all, note
that the vectors generated based on the model described above
can have any arbitrary number of nonzeros. However, their
support size is typically (with probability arbitrarily close to
one) around $n_1P_1 + n_2P_2$. Therefore, there is no such notion
of a strong threshold as in the case of [1]. We are asking
for what $P_1$ and $P_2$ signals generated based on
this model can be recovered with overwhelming probability
as $n \to \infty$. Moreover, we are wondering whether by adjusting the $w_i$'s
according to $P_1$ and $P_2$ one can extend the typical sparsity
to dimension ratio $(n_1P_1 + n_2P_2)/n$ for which reconstruction is
successful with high probability. This is the topic of the next
section.



III. Computation of the Weak Threshold 

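Because of the partial symmetry of the sparsity of the signal
we know that the optimum weights should take only two
positive values $W_1$ and $W_2$. In other words,²

$$\forall i \in \{1, 2, \ldots, n\}: \quad w_i = \begin{cases} W_1 & \text{if } i \in K_1 \\ W_2 & \text{if } i \in K_2 \end{cases}$$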

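Let $x$ be a random sparse signal generated based on the
non-uniformly sparse model of Section II and be supported on
the set $K$. $K$ is called $\epsilon$-typical if $\big| |K \cap K_1| - n_1P_1 \big| \le \epsilon n$
and $\big| |K \cap K_2| - n_2P_2 \big| \le \epsilon n$. Let $E$ be the event that $x$ is
recovered by (3). Then:

$$P[E^c] = P[E^c \mid K \text{ is } \epsilon\text{-typical}]\, P[K \text{ is } \epsilon\text{-typical}] + P[E^c \mid K \text{ not } \epsilon\text{-typical}]\, P[K \text{ not } \epsilon\text{-typical}]$$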

For any fixed $\epsilon > 0$, $P[K \text{ not } \epsilon\text{-typical}]$ will exponentially
approach zero as $n$ grows according to the law of large
numbers. So, to bound the probability of failed recovery we
may assume that $K$ is $\epsilon$-typical for any small enough $\epsilon$.
Therefore we just consider the case $|K| = k = n_1P_1 + n_2P_2$.
Similar to the null-space condition of [13], we present a
necessary and sufficient condition for $x$ to be the solution
to (3). It is as follows:

$$\forall Z \in \mathcal{N}(A): \quad \sum_{i \in K} w_i |Z_i| \le \sum_{i \in \bar{K}} w_i |Z_i|$$

where $\mathcal{N}(A)$ denotes the right nullspace of $A$ and $\bar{K}$ is the
complement of $K$. We can upper bound $P(E^c)$ with $P_{K,-}$, which is the
probability that a vector $x$ of a specific sign pattern (say non-positive) and
supported on the specific set $K$ is not recovered correctly
by (3). (A difference between this upper bound and the one
in [4] is that here there is no $\binom{n}{k} 2^k$ factor, and that is because
we have fixed the support set $K$ and the sign pattern of $x$.)
Exactly as done in [4], by restricting $x$ to the cross-polytope
$\{x \in \mathbb{R}^n \mid \|x\|_{w1} = 1\}$,³ and noting that $x$ is on a
$(k-1)$-dimensional face $F$ of the skewed cross-polytope
$SP = \{y \in \mathbb{R}^n \mid \|y\|_{w1} \le 1\}$, $P_{K,-}$ is essentially the probability that
a uniformly chosen $(n-m)$-dimensional subspace $\Psi$ shifted
by the point $x$, namely $(\Psi + x)$, intersects $SP$ nontrivially
at some other point besides $x$. $P_{K,-}$ is then interpreted as
the complementary Grassmann angle [9] for the face $F$ with
respect to the polytope $SP$ under the Grassmann manifold
$Gr_{(n-m)}(n)$. Building on the works by L. A. Santaló [11] and
P. McMullen [12] etc. in high dimensional integral geometry
and convex polytopes, the complementary Grassmann angle
for the $(k-1)$-dimensional face $F$ can be explicitly expressed
as the sum of products of internal angles and external angles
[10]:
$$P_{K,-} = 2 \sum_{s \ge 0} \; \sum_{G \in \Im_{m+1+2s}(SP)} \beta(F, G)\, \gamma(G, SP), \tag{4}$$



² Also we may assume WLG that $W_1 = 1$.

³ This is because the restricted polytope totally surrounds the origin in



where $s$ is any nonnegative integer, $G$ is any $(m + 1 + 2s)$-dimensional
face of the skewed cross-polytope ($\Im_{m+1+2s}(SP)$
is the set of all such faces), $\beta(\cdot, \cdot)$ stands for the internal angle
and $\gamma(\cdot, \cdot)$ stands for the external angle. The internal angles
and external angles are basically defined as follows [10], [12]:

• An internal angle $\beta(F_1, F_2)$ is the fraction of the hypersphere
$S$ covered by the cone obtained by observing the
face $F_2$ from the face $F_1$. The internal angle $\beta(F_1, F_2)$
is defined to be zero when $F_1 \nsubseteq F_2$ and is defined to be
one if $F_1 = F_2$.

• An external angle $\gamma(F_3, F_4)$ is the fraction of the hypersphere
$S$ covered by the cone of outward normals to
the hyperplanes supporting the face $F_4$ at the face $F_3$.
The external angle $\gamma(F_3, F_4)$ is defined to be zero when
$F_3 \nsubseteq F_4$ and is defined to be one if $F_3 = F_4$.
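Both angles are fractions of a sphere covered by a cone, so for small examples they can be estimated by sampling isotropic Gaussian directions and counting how many fall inside the cone. The sketch below does this for the simplest case we could verify by hand, which is an assumption chosen for illustration: the unweighted cross-polytope in $\mathbb{R}^2$, where the outward normal cone at the vertex $e_2$ is $\{x : |x_1| \le x_2\}$ and covers exactly one quarter of the circle. The function names are hypothetical.

```python
import numpy as np

def cone_fraction(in_cone, dim, num_samples, rng):
    """Monte Carlo estimate of the fraction of directions in R^dim inside a cone.

    Gaussian vectors are rotation invariant, so the fraction of samples in a
    full-dimensional cone equals the fraction of the unit sphere it covers.
    """
    g = rng.standard_normal((num_samples, dim))
    return float(np.mean(in_cone(g)))

def vertex_normal_cone(g):
    # Positive hull of the facet normals (+-1, ..., +-1, 1) at the vertex e_d
    # of the unweighted cross-polytope: {x : |x_i| <= x_d for all i < d}.
    return np.all(np.abs(g[:, :-1]) <= g[:, -1:], axis=1)

rng = np.random.default_rng(0)
est = cone_fraction(vertex_normal_cone, dim=2, num_samples=200_000, rng=rng)
print(est)  # should be close to the exact value 1/4
```

Sampling becomes impractical in high dimension because the angles are exponentially small, which is why the analysis works with closed-form integrals and exponents instead.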

Note that $F$ here is a typical face of $SP$ corresponding to
a typical set $K$. $\beta(F, G)$ depends not only on the dimension
of the face $G$, but also on the number of its vertices
supported on $K_1$ and $K_2$. In other words, if $G$ is supported on a
set $L$, then $\beta(F, G)$ is only a function of $|L \cap K_1|$ and $|L \cap K_2|$.
So we write $\beta(F, G) = \beta(t_1, t_2)$ and similarly $\gamma(G, SP) = \gamma(t_1, t_2)$,
where $t_1 = |L \cap K_1| - n_1P_1$ and $t_2 = |L \cap K_2| - n_2P_2$.
Combining the notations and counting the number of faces $G$,
(4) leads to:

$$P(E^c) \le \sum_{\substack{0 \le t_1 \le (1-P_1)n_1 \\ 0 \le t_2 \le (1-P_2)n_2 \\ t_1 + t_2 > m - k + 1}} 2^{t_1+t_2+1} \binom{(1-P_1)n_1}{t_1} \binom{(1-P_2)n_2}{t_2} \beta(t_1, t_2)\, \gamma(t_1, t_2) + O(e^{-cn}) \tag{5}$$

for some $c > 0$. As $n \to \infty$ each term in (5) behaves
like $\exp\{n\psi_{com}(t_1,t_2) - n\psi_{int}(t_1,t_2) - n\psi_{ext}(t_1,t_2)\}$, where
$\psi_{com}$, $\psi_{int}$ and $\psi_{ext}$ are the combinatorial exponent, the
internal angle exponent and the external angle exponent of
each term respectively. It can be shown that the necessary and
sufficient condition for (5) to tend to zero is that
$\psi(t_1,t_2) = \psi_{com}(t_1,t_2) - \psi_{int}(t_1,t_2) - \psi_{ext}(t_1,t_2)$ be uniformly negative
for all $t_1$ and $t_2$ in (5).
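The combinatorial exponent is the easiest of the three to make concrete. The sketch below assumes the scalings $n_1 = \gamma_1 n$, $n_2 = \gamma_2 n$, $t_1 = t_1' n$, $t_2 = t_2' n$ and applies the standard Stirling (binary entropy) approximation to the face count $2^{t_1+t_2}\binom{(1-P_1)n_1}{t_1}\binom{(1-P_2)n_2}{t_2}$ appearing in (5). The closed form in `psi_com` follows from that approximation and is not quoted from the paper; all function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, xlogy

def entropy(q):
    """Binary entropy in nats, with the convention 0 log 0 = 0."""
    return -xlogy(q, q) - xlogy(1.0 - q, 1.0 - q)

def psi_com(t1p, t2p, gamma1, gamma2, p1, p2):
    """Exponent of 2^(t1+t2) C((1-P1)n1, t1) C((1-P2)n2, t2) per unit n."""
    a1 = gamma1 * (1.0 - p1)          # (1 - P1) n1 / n
    a2 = gamma2 * (1.0 - p2)          # (1 - P2) n2 / n
    return ((t1p + t2p) * np.log(2.0)
            + a1 * entropy(t1p / a1)
            + a2 * entropy(t2p / a2))

def log_binom(N, t):
    """log of the binomial coefficient C(N, t) via log-gamma."""
    return gammaln(N + 1) - gammaln(t + 1) - gammaln(N - t + 1)

# Compare the exponent with the exact normalized log-count at a large n.
gamma1, gamma2, p1, p2 = 0.5, 0.5, 0.3, 0.1
t1p, t2p = 0.1, 0.2
n = 10**6
N1, N2 = gamma1 * (1 - p1) * n, gamma2 * (1 - p2) * n
t1, t2 = t1p * n, t2p * n
exact = ((t1 + t2) * np.log(2.0) + log_binom(N1, t1) + log_binom(N2, t2)) / n
approx = psi_com(t1p, t2p, gamma1, gamma2, p1, p2)
print(exact, approx)
```

The two values agree up to a correction of order $(\log n)/n$, which is why only the exponent matters in the limit.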

In the following subsections we will try to evaluate the
internal and external angles for a typical face $F$ and a face
$G$ containing $F$, and try to give closed form upper bounds
for them. We combine the terms together and compute the
exponents using the Laplace method in Section IV and derive
thresholds for nonnegativity of the cumulative exponent.

A. Derivation of the Internal Angles 

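Suppose that $F$ is a typical $(k-1)$-dimensional face of the
skewed cross-polytope

$$SP = \Big\{y \in \mathbb{R}^n \;\Big|\; \|y\|_{w1} = \sum_{i=1}^{n} w_i |y_i| \le 1\Big\}$$

supported on the subset $K$ with $|K| = k \approx n_1P_1 + n_2P_2$. Let
$G$ be an $(l-1)$-dimensional face of $SP$ supported on the set $L$
with $F \subset G$. Also, consistent with the definition of $t_1$ and $t_2$
in Section III, let $|L \cap K_1| = n_1P_1 + t_1$ and $|L \cap K_2| = n_2P_2 + t_2$.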



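First we can prove the following lemma:

Lemma 1: Let $Con_{F^\perp, G}$ be the positive cone of all the
vectors $x \in \mathbb{R}^n$ that take the form:

$$-\sum_{i=1}^{k} b_i e_i + \sum_{i=k+1}^{l} b_i e_i \tag{6}$$

where $b_i$, $1 \le i \le l$, are nonnegative real numbers and

$$\sum_{i=1}^{k} w_i b_i = \sum_{i=k+1}^{l} w_i b_i, \qquad \frac{b_1}{w_1} = \frac{b_2}{w_2} = \cdots = \frac{b_k}{w_k}.$$

Then

$$\int_{Con_{F^\perp, G}} e^{-\|x\|^2} dx = \beta(F, G)\, V_{l-k-1}(S^{l-k-1}) \times \int_0^\infty e^{-r^2} r^{l-k-1} dr = \beta(F, G) \cdot \pi^{(l-k)/2} \tag{7}$$

where $V_{l-k-1}(S^{l-k-1})$ is the spherical volume of the
$(l-k-1)$-dimensional sphere $S^{l-k-1}$.

Proof: Omitted for brevity. ■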
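The last equality in (7) rests on the identity $V_{d-1}(S^{d-1}) \int_0^\infty e^{-r^2} r^{d-1} dr = \pi^{d/2}$ with $d = l - k$, which follows from the known sphere area $2\pi^{d/2}/\Gamma(d/2)$ and the radial integral $\Gamma(d/2)/2$. A small numerical check is sketched below; the function names are hypothetical.

```python
import math
from scipy.integrate import quad

def sphere_area(d):
    """Surface area of the unit sphere S^(d-1) embedded in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def radial_integral(d):
    """Numerically evaluate int_0^inf exp(-r^2) r^(d-1) dr."""
    value, _ = quad(lambda r: math.exp(-r * r) * r ** (d - 1), 0.0, math.inf)
    return value

# d plays the role of l - k in Lemma 1.
results = {d: (sphere_area(d) * radial_integral(d), math.pi ** (d / 2))
           for d in range(1, 8)}
for d, (lhs, rhs) in results.items():
    print(d, lhs, rhs)
```

The same identity with $d = n - l + 1$ is what turns the Gaussian integral over the outward normal cone into the external angle in the next subsection.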
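From (7) we can find the expression for the internal angle.
Define $U \subseteq \mathbb{R}^{l-k+1}$ as the set of all nonnegative vectors
$(x_1, x_2, \ldots, x_{l-k+1})$ satisfying:

$$x_p \ge 0,\ 1 \le p \le l-k+1, \qquad \Big(\sum_{p=1}^{k} w_p^2\Big) x_1 = \sum_{p=k+1}^{l} w_p^2\, x_{p-k+1}$$

and define $f(x_1, \ldots, x_{l-k+1}) : U \to Con_{F^\perp, G}$ to be the
linear and bijective map

$$f(x_1, \ldots, x_{l-k+1}) = -\sum_{p=1}^{k} x_1 w_p e_p + \sum_{p=k+1}^{l} x_{p-k+1} w_p e_p.$$

Then

$$\int_{Con_{F^\perp, G}} e^{-\|x'\|^2} dx' = \int_U e^{-\|f(x)\|^2} df(x) = |J(A)| \int_\Gamma e^{-\|f(x)\|^2} dx_2 \cdots dx_{l-k+1}$$

$$= |J(A)| \int_\Gamma e^{-\left(\sum_{p=1}^{k} w_p^2\right) x_1^2 - \sum_{p=k+1}^{l} w_p^2 x_{p-k+1}^2} dx_2 \cdots dx_{l-k+1} \tag{8}$$

$\Gamma$ is the region described by

$$\Big(\sum_{p=1}^{k} w_p^2\Big) x_1 = \sum_{p=k+1}^{l} w_p^2\, x_{p-k+1}, \qquad x_p \ge 0,\ 2 \le p \le l-k+1 \tag{9}$$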

where $|J(A)|$ is due to the change of integral variables and
is essentially the determinant of the Jacobian of the variable
transform given by the $l \times (l-k)$ matrix $A$:

$$A_{i,j} = \begin{cases} -\frac{1}{\Omega}\, w_i w_{k+j}^2 & 1 \le i \le k,\ 1 \le j \le l-k \\ w_i & k+1 \le i \le l,\ j = i-k \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

Fig. 1: Recoverable $P_1$ threshold as a function of $W_2$. $P_2 = 0.1$, $m = 0.75n$.

Fig. 3: Successful recovery percentage for different weights. $P_2 = 0.1$ and $m = 0.75n$.



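where $\Omega = \sum_{p=1}^{k} w_p^2$. Now $|J(A)| = \sqrt{\det(A^T A)}$. By
finding the eigenvalues of $A^T A$ we obtain:

$$|J(A)| = \sqrt{\frac{\sum_{p=1}^{l} w_p^2}{\sum_{p=1}^{k} w_p^2}}\; \prod_{p=k+1}^{l} w_p. \tag{11}$$

Now we define a random variable

$$Z = \Big(\sum_{p=1}^{k} w_p^2\Big) X_1 - \sum_{p=k+1}^{l} w_p^2\, X_{p-k+1}$$

where $X_1, X_2, \ldots, X_{l-k+1}$ are independent random variables,
with $X_p \sim HN\big(0, \frac{1}{2 w_{p+k-1}^2}\big)$, $2 \le p \le l-k+1$, as
half-normal distributed random variables and
$X_1 \sim N\big(0, \frac{1}{2 \sum_{p=1}^{k} w_p^2}\big)$ as a normal distributed random variable. Then
by inspection, (8) is equal to $C\, p_Z(0)$, where $p_Z(\cdot)$ is the
probability density function for the random variable $Z$, $p_Z(0)$ is the
probability density function evaluated at the point $Z = 0$, and

$$C = \frac{\sqrt{\pi}^{\,l-k+1}}{2^{l-k}} \sqrt{(n_1P_1 + t_1) W_1^2 + (n_2P_2 + t_2) W_2^2}. \tag{12}$$

Combining (7) and (8):

$$\beta(t_1, t_2) = \pi^{-\frac{l-k}{2}}\, C\, p_Z(0). \tag{13}$$

B. Derivation of the External Angle

Without loss of generality, assume $K = \{n-k+1, \ldots, n\}$.
Consider the $(l-1)$-dimensional face

$$G = \mathrm{conv}\Big\{\frac{e_{n-l+1}}{w_{n-l+1}}, \ldots, \frac{e_{n-k}}{w_{n-k}}, \frac{e_{n-k+1}}{w_{n-k+1}}, \ldots, \frac{e_n}{w_n}\Big\}$$

of the skewed cross-polytope $SP$. The $2^{n-l}$ outward normal
vectors of the supporting hyperplanes of the facets containing
$G$ are given by

$$\Big\{\sum_{i=1}^{n-l} j_i w_i e_i + \sum_{i=n-l+1}^{n} w_i e_i,\ \ j_i \in \{-1, 1\}\Big\}.$$

Then the outward normal cone $c(G, SP)$ at the face $G$ is
the positive hull of these normal vectors. Thus

$$\int_{c(G,SP)} e^{-\|x\|^2} dx = \gamma(G, SP)\, V_{n-l}(S^{n-l}) \int_0^\infty e^{-r^2} r^{n-l} dr = \gamma(G, SP) \cdot \pi^{(n-l+1)/2} \tag{14}$$

where $V_{n-l}(S^{n-l})$ is the spherical volume of the $(n-l)$-dimensional
sphere $S^{n-l}$. Now define $U$ to be the set

$$\{x \in \mathbb{R}^{n-l+1} \mid x_{n-l+1} \ge 0,\ |x_i / w_i| \le x_{n-l+1},\ 1 \le i \le (n-l)\}$$

and define $f(x_1, \ldots, x_{n-l+1}) : U \to c(G, SP)$ to be the
linear and bijective map

$$f(x_1, \ldots, x_{n-l+1}) = \sum_{i=1}^{n-l} x_i e_i + \sum_{i=n-l+1}^{n} x_{n-l+1} w_i e_i.$$

Then

$$\int_{c(G,SP)} e^{-\|x'\|^2} dx' = |J(A)| \int_U e^{-\|f(x)\|^2} dx$$

$$= |J(A)| \int_0^\infty e^{-\xi^2 x^2} \Big(\int_{-W_1 x}^{W_1 x} e^{-y^2} dy\Big)^{r_1} \Big(\int_{-W_2 x}^{W_2 x} e^{-y^2} dy\Big)^{r_2} dx$$

$$= 2^{n-l} \int_0^\infty e^{-x^2} \Big(\int_0^{W_1 x/\xi} e^{-y^2} dy\Big)^{r_1} \Big(\int_0^{W_2 x/\xi} e^{-y^2} dy\Big)^{r_2} dx \tag{15}$$

where $\xi = \xi(t_1, t_2) = \sqrt{\sum_{i=n-l+1}^{n} w_i^2}$, $r_1 = (1-P_1)n_1 - t_1$ and
$r_2 = (1-P_2)n_2 - t_2$. $|J(A)| = \sqrt{\det(A^T A)} = \xi$ results from
the change of variable in the integral.

IV. Exponent Calculation

Using the Laplace method we compute the angle exponents.
They are given in the following theorems, the proofs of which
are omitted for brevity. We assume $n_1 = \gamma_1 n$, $n_2 = \gamma_2 n$ and
WLG $W_1 = 1$, $W_2 = W$.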
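For small problem sizes the external angle can be evaluated numerically from (14) and (15). The sketch below assumes the form $\gamma = \pi^{-(n-l+1)/2}\, 2^{n-l} \int_0^\infty e^{-x^2} \big(\int_0^{W_1 x/\xi} e^{-y^2}dy\big)^{r_1} \big(\int_0^{W_2 x/\xi} e^{-y^2}dy\big)^{r_2} dx$ with $r_1 + r_2 = n - l$, and uses $\int_0^a e^{-y^2} dy = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(a)$ to collapse the constants to $\pi^{-1/2}$. The function name and the example values are assumptions.

```python
import math
from scipy.integrate import quad
from scipy.special import erf

def external_angle(r1, r2, xi, W1=1.0, W2=1.0):
    """External angle from (14)-(15) for small r1, r2.

    With F(a) = int_0^a exp(-y^2) dy = (sqrt(pi)/2) erf(a), the prefactors
    2^(n-l) and pi^(-(n-l+1)/2) reduce to pi^(-1/2) in front of
    int_0^inf exp(-x^2) erf(W1 x / xi)^r1 erf(W2 x / xi)^r2 dx.
    """
    def integrand(x):
        return (math.exp(-x * x)
                * erf(W1 * x / xi) ** r1
                * erf(W2 * x / xi) ** r2)
    value, _ = quad(integrand, 0.0, math.inf)
    return value / math.sqrt(math.pi)

# Sanity check: unweighted square in R^2, G a vertex (l = 1, n - l = 1, xi = 1).
# The normal cone at a vertex of the square covers a quarter of the circle.
gamma_vertex = external_angle(r1=1, r2=0, xi=1.0)
# Illustrative weighted case: one coordinate from each set inside L, W2 = 2.
gamma_weighted = external_angle(r1=1, r2=1, xi=math.sqrt(1.0 + 4.0), W2=2.0)
print(gamma_vertex, gamma_weighted)
```

For large $r_1$, $r_2$ the integrand underflows, which is one practical reason to work with the exponent $\psi_{ext}$ rather than with the angle itself.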




Fig. 2: Successful recovery percentage for weighted $\ell_1$ minimization with different weights (a) and suboptimal weights (b) in a nonuniform sparse setting. $P_2 = 0.05$ and $m = 0.5n$.



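Theorem 1: Let $t_1 = t_1' n$, $t_2 = t_2' n$, $g(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}$ and
$G(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-y^2} dy$. Also define
$C = (t_1' + \gamma_1 P_1) + W^2 (t_2' + \gamma_2 P_2)$, $D_1 = \gamma_1 (1 - P_1) - t_1'$ and
$D_2 = \gamma_2 (1 - P_2) - t_2'$. Let $x_0$ be the unique solution to $x$ of the following:

$$2C - \frac{g(x) D_1}{x\, G(x)} - \frac{W g(Wx) D_2}{x\, G(Wx)} = 0$$

Then

$$\psi_{ext}(t_1, t_2) = C x_0^2 - D_1 \log G(x_0) - D_2 \log G(W x_0) \tag{16}$$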
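Theorem 1 translates directly into a one-dimensional root-finding problem. The sketch below assumes the stationarity condition $2C - g(x)D_1/(xG(x)) - Wg(Wx)D_2/(xG(Wx)) = 0$ and uses the fact that $G$ as defined is the error function, so `scipy.special.erf` can stand in for it. The bracket `[1e-6, 50]`, the function name `psi_ext` and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erf

def g(x):
    """g(x) = (2 / sqrt(pi)) exp(-x^2), the derivative of erf."""
    return 2.0 / np.sqrt(np.pi) * np.exp(-x * x)

def psi_ext(t1p, t2p, gamma1, gamma2, p1, p2, W):
    """External angle exponent of Theorem 1 (with W1 = 1, W2 = W)."""
    C = (t1p + gamma1 * p1) + W**2 * (t2p + gamma2 * p2)
    D1 = gamma1 * (1.0 - p1) - t1p
    D2 = gamma2 * (1.0 - p2) - t2p

    def stationarity(x):
        return (2.0 * C
                - g(x) * D1 / (x * erf(x))
                - W * g(W * x) * D2 / (x * erf(W * x)))

    # The left side is very negative near 0 and tends to 2C > 0 for large x
    # (when D1 + D2 > 0), so a sign change is bracketed.
    x0 = brentq(stationarity, 1e-6, 50.0)
    value = C * x0**2 - D1 * np.log(erf(x0)) - D2 * np.log(erf(W * x0))
    return value, x0

psi, x0 = psi_ext(t1p=0.1, t2p=0.2, gamma1=0.5, gamma2=0.5, p1=0.3, p2=0.1, W=2.0)
print(psi, x0)
```

Since $-\log \mathrm{erf}$ is convex on the positive axis, the stationary point is the minimizer of $Cx^2 - D_1\log G(x) - D_2 \log G(Wx)$, which is how the Laplace method produces (16).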

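Theorem 2: Let $b = \frac{t_1' + W^2 t_2'}{t_1' + t_2'}$ and let $\varphi(\cdot)$ and $\Phi(\cdot)$ be the
standard Gaussian pdf and cdf functions respectively. Also
let $\Omega' = \gamma_1 P_1 + W^2 \gamma_2 P_2$ and
$Q(s) = \frac{t_1' \varphi(s)}{(t_1' + t_2') \Phi(s)} + \frac{W t_2' \varphi(Ws)}{(t_1' + t_2') \Phi(Ws)}$. Define the function
$M(s) = -\frac{s}{Q(s)}$ and solve for $s$ in $M(s) = \frac{t_1' + t_2'}{(t_1' + t_2') b + \Omega'}$. Let the
unique solution be $s^*$ and set $y = s^* \big(b - \frac{1}{M(s^*)}\big)$. Compute
the rate function $\Lambda^*(y) = sy - \frac{t_1'}{t_1' + t_2'} \Lambda_1(s) - \frac{t_2'}{t_1' + t_2'} \Lambda_1(Ws)$
at the point $s = s^*$, where $\Lambda_1(s) = \frac{s^2}{2} + \log(2\Phi(s))$. The
internal angle exponent is then given by:

$$\psi_{int}(t_1, t_2) = \Big(\Lambda^*(y) + \frac{t_1' + t_2'}{2\Omega'}\, y^2 + \log 2\Big)(t_1' + t_2') \tag{17}$$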
As an illustration of these results, for $P_2 = 0.1$ and
$\delta = m/n = 0.75$, using Theorems 1 and 2 and combining the
exponents with the combinatorial exponent, we have calculated
the threshold for $P_1$ for different values of $W_2$ in the range
$[1, 3]$, below which the signal can be recovered. The curve
is depicted in Figure 1. As expected, the curve suggests
that in this setting weighted $\ell_1$ minimization boosts the weak
threshold in comparison with $\ell_1$ minimization. This is verified
in the next section by some examples.

V. Simulation 

We demonstrate by some examples that appropriate weights
can boost the recovery percentage. We fix $P_2$ and $n = 2m = 200$,
and try $\ell_1$ and weighted $\ell_1$ minimization for various
values of $P_1$. We choose $n_1 = n_2 = n/2$. Figure 2a shows
one such comparison for $P_2 = 0.05$ and different values of
$W_2$. Note that the optimal value of $W_2$ varies as $P_1$ changes.
Figure 2b illustrates how the optimal weighted $\ell_1$ minimization
surpasses the ordinary $\ell_1$ minimization. The optimal curve
is basically achieved by selecting the best weight of Figure
2a for each single value of $P_1$. Figure 3 shows the result of
simulations in another setting where $P_2 = 0.1$ and $m = 0.75n$
(similar to the setting of the previous section). It is clear from
the figure that the recovery success threshold for $P_1$ has been
shifted higher when using weighted $\ell_1$ minimization rather
than standard $\ell_1$ minimization. Note that this result matches
the theoretical result of Figure 1 very well.
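An experiment of this kind can be sketched in a few lines. The code below is an assumption-laden miniature, not the setup behind the figures: it uses a smaller dimension than $n = 200$ so that it runs quickly, Gaussian nonzero values, a relative error tolerance to declare success, and an LP solved with SciPy's `linprog`. The function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_min(A, y, w):
    """min sum_i w_i |x_i| s.t. A x = y, via the LP split x = u - v, u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

def recovery_rate(n, m, p1, p2, W2, trials, rng, tol=1e-4):
    """Fraction of random instances recovered with weights (1, W2), n1 = n2 = n/2."""
    n1 = n // 2
    w = np.concatenate([np.ones(n1), np.full(n - n1, W2)])
    probs = np.concatenate([np.full(n1, p1), np.full(n - n1, p2)])
    hits = 0
    for _ in range(trials):
        support = rng.random(n) < probs
        x = np.zeros(n)
        x[support] = rng.standard_normal(int(support.sum()))  # assumed value law
        A = rng.standard_normal((m, n))
        x_hat = weighted_l1_min(A, A @ x, w)
        # Assumed success criterion: small error relative to the signal norm.
        if np.linalg.norm(x_hat - x) <= tol * max(1.0, np.linalg.norm(x)):
            hits += 1
    return hits / trials

rng = np.random.default_rng(0)
# W2 = 1 is ordinary l1 minimization; larger W2 penalizes the sparser set K2 more.
rates = {W2: recovery_rate(n=60, m=30, p1=0.4, p2=0.05, W2=W2, trials=20, rng=rng)
         for W2 in (1.0, 2.0, 3.0)}
print(rates)
```

Sweeping `p1` for each `W2` and plotting the rates gives curves of the kind described for Figures 2 and 3, although with so few trials and such a small dimension the estimates are noisy.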